Dataset Description

Column

The focus of our dashboard is to explore the interactions between multiple factors collected from NCAA Basketball teams this season. We gathered our data from two online sources: our first data set was web scraped from “kenpom.com”, and our second was downloaded from “sports-reference.com”. We merged the two datasets on the name of the college to increase our number of variables, giving us more avenues to explore in our analysis. Our final data set has 41 variables and 362 observations.
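The merge itself was done in R, but the operation is a simple inner join on the team name; here is an illustrative Python sketch with toy rows (the columns shown are only a small subset of the real 41):

```python
# Hypothetical toy rows standing in for the kenpom.com and
# sports-reference.com tables; values copied from the data preview.
kenpom = [
    {"Team": "Auburn", "OE": 120.4, "DE": 92.4},
    {"Team": "Duke", "OE": 121.6, "DE": 95.2},
]
sports_ref = [
    {"Team": "Auburn", "Wins": 27, "Losses": 8},
    {"Team": "Duke", "Wins": 27, "Losses": 9},
]

# Inner join on the college name: keep only teams present in both sources,
# combining the columns from each.
by_team = {row["Team"]: row for row in sports_ref}
merged = [
    {**kp, **by_team[kp["Team"]]} for kp in kenpom if kp["Team"] in by_team
]
```

Each merged row now carries both the efficiency metrics and the win-loss record for one college.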

Variable Description
Team Team Name
Rk National Rank
Conf Conference
Wins Number of Games Won
Losses Number of Games Lost
EM Efficiency Margin
OE Offensive Efficiency
DE Defensive Efficiency
Tempo Tempo (number of possessions per game)
Luck Luck Rating
SOS Strength of Schedule
OppO Opposition Offensive Efficiency
OppD Opposition Defensive Efficiency
NCSOS Non-Conference Strength of Schedule
Win_Loss_Percentage Win-Loss Percentage
Conference Wins Number of Conference Games Won
Conference Losses Number of Conference Games Lost
Home_W Home Games Won
Home_L Home Games Lost
Away_W Away Games Won
Away_L Away Games Lost
Points_For Total Points Scored
Points_Against Total Points Given Up
MP Minutes Played
FG Field Goals Made
FGA Field Goals Attempted
FG_Percentage Field Goal Percentage
3P 3-Pointers Made
3PA 3-Pointers Attempted
3P_Percentage 3-Pointer Percentage
FT Free-Throws Made
FTA Free-Throws Attempted
FT_Percentage Free-Throw Percentage
ORB Offensive Rebounds
TRB Total Rebounds
AST Assists
STL Steals
BLK Blocks
TOV Turnovers
PF Personal Fouls
NCAA_Tourney Made the NCAA Tournament

MLR

Column

Research Question

What factors contribute to the number of wins a team achieves in college basketball, and how accurately can a multiple linear regression (MLR) model predict the win count based on these factors?

Fitting an MLR with all predictors

   Rk           Team Conf Wins Losses    EM    OE    DE Tempo   Luck   SOS
4   4         Auburn  SEC   27      8 27.99 120.4  92.4  70.0 -0.080  9.49
5   5      Tennessee  SEC   27      9 26.61 116.8  90.2  69.3 -0.026 13.35
7   7           Duke  ACC   27      9 26.47 121.6  95.2  66.4 -0.064 10.07
9   9 North Carolina  ACC   29      8 26.19 119.7  93.5  70.6 -0.038 12.17
14 14        Alabama  SEC   25     12 22.96 126.0 103.0  72.6 -0.001 14.71
19 19        Clemson  ACC   24     12 19.44 117.7  98.3  66.4 -0.018 12.09
    OppO  OppD NCSOS
4  111.9 102.4  1.47
5  114.6 101.2  8.97
7  111.1 101.1 -0.04
9  112.6 100.5  6.99
14 115.1 100.4  9.46
19 113.5 101.4  4.91

Call:
lm(formula = Wins ~ ., data = predictive_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2478 -0.5713 -0.1385  0.6979  2.0755 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 19.591099  49.696734   0.394  0.69833   
Rk          -0.026090   0.022720  -1.148  0.26674   
Losses       0.184998   0.337237   0.549  0.59043   
EM           0.651338  10.265037   0.063  0.95015   
OE          -0.004057  10.280409   0.000  0.99969   
DE          -0.068121  10.285411  -0.007  0.99479   
Tempo        0.074010   0.125028   0.592  0.56167   
Luck        32.999028  10.587589   3.117  0.00627 **
SOS          1.950460   7.841266   0.249  0.80654   
OppO        -1.915704   7.705578  -0.249  0.80664   
OppD         1.845868   7.827056   0.236  0.81638   
NCSOS       -0.232496   0.127219  -1.828  0.08523 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.267 on 17 degrees of freedom
Multiple R-squared:  0.9708,    Adjusted R-squared:  0.952 
F-statistic: 51.47 on 11 and 17 DF,  p-value: 7.883e-11
          Rk       Losses           EM           OE           DE        Tempo 
2.331805e+01 4.140231e+01 1.261413e+05 5.916812e+04 3.688691e+04 2.209332e+00 
        Luck          SOS         OppO         OppD        NCSOS 
8.479177e+00 2.539884e+03 1.434161e+03 6.217872e+02 4.570716e+00 
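The unlabeled block of numbers beneath the model summary is a set of variance inflation factors (VIFs), presumably from car::vif; values in the tens of thousands for EM, OE, and DE signal severe multicollinearity. As a sketch of what that computation does, here is a numpy-only version run on synthetic data containing one nearly collinear pair:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p):
    regress each predictor on all the others and use VIF = 1 / (1 - R^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add an intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                    # independent of the others
vifs = vif(np.column_stack([x1, x2, x3]))
# x1 and x2 get very large VIFs; x3 stays near 1
```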

Reducing the model to eliminate collinearity between predictors


Call:
lm(formula = Wins ~ OE + DE + Tempo + Luck + OppO + OppD + NCSOS, 
    data = predictive_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.23624 -0.83358  0.04624  0.91181  1.89847 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 26.96367   45.13411   0.597   0.5566    
OE           0.71375    0.05234  13.637 6.66e-12 ***
DE          -0.76229    0.06221 -12.254 4.94e-11 ***
Tempo        0.05677    0.11074   0.513   0.6136    
Luck        29.34739    3.70023   7.931 9.45e-08 ***
OppO         0.15818    0.29564   0.535   0.5982    
OppD        -0.32472    0.37518  -0.865   0.3966    
NCSOS       -0.25582    0.10549  -2.425   0.0244 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.194 on 21 degrees of freedom
Multiple R-squared:  0.968, Adjusted R-squared:  0.9574 
F-statistic:  90.8 on 7 and 21 DF,  p-value: 2.928e-14
      OE       DE    Tempo     Luck     OppO     OppD    NCSOS 
1.726822 1.519118 1.951416 1.166057 2.376931 1.608535 3.538177 

From the Added-Variable Plots we can see that there is a linear relationship between almost all of the predictors and our response variable.

Checking the Assumptions of our model


    studentized Breusch-Pagan test

data:  reduced_mdl
BP = 10.267, df = 7, p-value = 0.1739


    Shapiro-Wilk normality test

data:  student_r
W = 0.97412, p-value = 0.6756

Outlier Analysis

   Rk       Team Conf Wins Losses   EM    OE   DE Tempo  Luck   SOS  OppO  OppD
45 45 N.C. State  ACC   26     15 15.9 114.3 98.4    68 0.014 11.33 112.2 100.8
   NCSOS
45 -4.99

It is interesting to see that N.C. State is an outlier. This is a team that went on a historic run to end the season, which could be causing them to appear as an influential point. Since they are not affecting our model assumptions and are not an erroneous data point, we will not remove them.

Column

Exhaustive Model Approach

         OE  DE  Tempo Luck OppO OppD NCSOS R2      AdjR2   Cp        BIC      
1  ( 1 ) "*" " " " "   " "  " "  " "  " "   "0.576" "0.56"  "253.402" "-18.148"
2  ( 1 ) "*" "*" " "   " "  " "  " "  " "   "0.859" "0.848" "69.62"   "-46.697"
3  ( 1 ) "*" "*" " "   "*"  " "  " "  " "   "0.954" "0.949" "9.126"   "-75.9"  
4  ( 1 ) "*" "*" " "   "*"  " "  " "  "*"   "0.966" "0.961" "3.05"    "-81.583"
5  ( 1 ) "*" "*" " "   "*"  " "  "*"  "*"   "0.967" "0.96"  "4.623"   "-78.783"
6  ( 1 ) "*" "*" " "   "*"  "*"  "*"  "*"   "0.968" "0.959" "6.263"   "-75.902"
7  ( 1 ) "*" "*" "*"   "*"  "*"  "*"  "*"   "0.968" "0.957" "8"       "-72.896"
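The table above comes from an exhaustive best-subset search (e.g. leaps::regsubsets): for each model size it fits every subset of predictors and tabulates criteria such as adjusted R². A minimal Python sketch of the same idea on synthetic data (the variable names are reused only for illustration):

```python
from itertools import combinations
import numpy as np

def fit_r2(X, y):
    """R^2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

def best_subsets(X, y, names):
    """For each model size k, return the subset with the highest R^2
    together with its adjusted R^2 (one of the criteria tabulated above)."""
    n, p = X.shape
    results = {}
    for k in range(1, p + 1):
        best = max(combinations(range(p), k),
                   key=lambda idx: fit_r2(X[:, list(idx)], y))
        r2 = fit_r2(X[:, list(best)], y)
        adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        results[k] = ([names[i] for i in best], round(adj, 3))
    return results

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 4))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=n)  # 2 real signals
res = best_subsets(X, y, ["OE", "DE", "Luck", "NCSOS"])
# the size-2 winner recovers the two true predictors
```

Exhaustive search is only feasible for a handful of predictors (2^p subsets), which is why the reduced seven-predictor model is the starting point here.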

Analyzing what was found to be the best model


Call:
lm(formula = Wins ~ OE + DE + Luck + NCSOS, data = predictive_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2479 -0.7750 -0.1218  0.5757  2.6843 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.20368    8.07252   1.512  0.14365    
OE           0.72832    0.04445  16.384 1.57e-14 ***
DE          -0.74609    0.05194 -14.364 2.78e-13 ***
Luck        28.63050    3.42921   8.349 1.47e-08 ***
NCSOS       -0.17619    0.05943  -2.965  0.00674 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.144 on 24 degrees of freedom
Multiple R-squared:  0.9664,    Adjusted R-squared:  0.9608 
F-statistic: 172.7 on 4 and 24 DF,  p-value: < 2.2e-16


    studentized Breusch-Pagan test

data:  best_mdl
BP = 6.9124, df = 4, p-value = 0.1406


    Shapiro-Wilk normality test

data:  student_r
W = 0.9607, p-value = 0.342

The model that we found to be the best at predicting the number of wins a team will get is: \[Wins = 12.20 + 0.73x_{OE} - 0.75x_{DE} + 28.63x_{Luck} - 0.18x_{NCSOS}\]
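Plugging a team's statistics into the fitted equation yields a predicted win total. A small sketch using the full-precision coefficients from the lm summary and Auburn's row from the data preview:

```python
def predict_wins(oe, de, luck, ncsos):
    """Fitted values from the selected model; coefficients are taken
    directly from the lm summary above."""
    return (12.20368 + 0.72832 * oe - 0.74609 * de
            + 28.63050 * luck - 0.17619 * ncsos)

# Auburn's row: OE 120.4, DE 92.4, Luck -0.080, NCSOS 1.47 (actual Wins: 27)
pred = predict_wins(120.4, 92.4, -0.080, 1.47)
```

The prediction of about 28.4 wins sits close to Auburn's actual 27, consistent with the residual standard error of 1.144.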

Ridge regression

Column

Research Question and What is ridge regression?

Question: What is the relationship between Wins, SOS, and OE, and how accurately can we predict Wins from SOS and OE by constructing a ridge regression model?

Ridge Regression

Ridge regression is a regularization technique (a statistical method used to reduce error caused by overfitting) for linear regression models. It is used to curb overfitting to the training data and is also known as L2 regularization. The problem it addresses is multicollinearity. In this regularization technique we add a bias into the model in order to decrease the model’s variance.

The residual sum of squares (RSS) formula for linear regression is given by  
\(RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)

Where:
n is the number of data points in the dataset.
\(y_i\) is the observed value of the dependent variable for data point
\(\hat{y}_i\) is the predicted value of the dependent variable for data point i based on the regression model.  
Whereas by adding the regularization term according to ridge regression we get  
\(RSS_{ridge} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2\)  
\(\lambda\) is the regularization parameter (also known as the ridge parameter or penalty parameter) that controls the strength of the regularization.
\(p\) is the number of predictor variables (features) in the regression model.
\(\beta_j\) represents the coefficients (weights) associated with each predictor variable.
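Minimizing \(RSS_{ridge}\) has a closed-form solution, \(\hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty\). A numpy sketch on synthetic data (the intercept is left unpenalized, as glmnet does, by centering first):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * ||b||^2 via the closed-form
    solution on centered data; the intercept is recovered afterwards."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=100)
_, b_ols = ridge_fit(X, y, 0.0)    # lam = 0 reduces to ordinary least squares
_, b_big = ridge_fit(X, y, 1e4)    # a large lam shrinks coefficients toward 0
```

This shrinkage toward zero is exactly the bias-for-variance trade mentioned above.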

3D Interactive plot of model.

[1] "y = 0.752500495321759 * OE + -0.265035011621413 * SOS + -62.3402694619341"

3D scatter plot of the training data, where the x-axis represents offensive efficiency (OE), the y-axis represents the strength of schedule (SOS), and the z-axis represents the number of wins (Wins). Each data point is represented as a marker in the plot.

Column

Making model equation for ridge regression

           Length Class  Mode     
lambda     100    -none- numeric  
cvm        100    -none- numeric  
cvsd       100    -none- numeric  
cvup       100    -none- numeric  
cvlo       100    -none- numeric  
nzero      100    -none- numeric  
call         4    -none- call     
name         1    -none- character
glmnet.fit  12    elnet  list     
lambda.min   1    -none- numeric  
lambda.1se   1    -none- numeric  
index        2    -none- numeric  

To obtain the equation of the ridge regression model, we first fitted the model using cross-validated ridge regression with the cv.glmnet function in R. This function selects an optimal lambda value through cross-validation.

After fitting the ridge regression model, we extracted the coefficients corresponding to the optimal lambda value. The coefficients represent the weights assigned to each predictor variable in the model.  

The equation of the ridge regression model can be written as follows:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n \]

Where:
- \(y\) is the dependent variable (e.g., Wins in our case).
- \(\beta_0\) is the intercept term.
- \(\beta_1, \beta_2, \ldots, \beta_n\) are the coefficients corresponding to predictor variables \(x_1, x_2, \ldots, x_n\) respectively.

For our specific ridge regression model, the coefficients and variables are substituted into the equation to form the final equation, which can be written in the form:

\[ \text{Wins} = -62.34 + 0.75 \times \text{OE} - 0.27 \times \text{SOS} \]
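cv.glmnet arrives at its lambda.min by k-fold cross-validation over a grid of \(\lambda\) values. A numpy-only sketch of that selection loop, with synthetic stand-ins for the standardized predictors:

```python
import numpy as np

def ridge_beta(X, y, lam):
    """Closed-form ridge coefficients (no intercept; data assumed centered)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_lambda(X, y, lambdas, k=5):
    """Pick lambda by k-fold cross-validated MSE -- the idea behind
    cv.glmnet's lambda.min."""
    n = len(y)
    folds = np.arange(n) % k
    cvm = []
    for lam in lambdas:
        errs = []
        for f in range(k):
            tr, te = folds != f, folds == f
            b = ridge_beta(X[tr], y[tr], lam)
            errs.append(np.mean((y[te] - X[te] @ b) ** 2))
        cvm.append(np.mean(errs))                 # mean CV error per lambda
    return lambdas[int(np.argmin(cvm))]

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))                     # stand-ins for OE and SOS
y = 0.75 * X[:, 0] - 0.27 * X[:, 1] + rng.normal(scale=0.5, size=120)
best_lam = cv_lambda(X, y, np.array([0.01, 0.1, 1.0, 10.0, 100.0]))
```

With a strong signal and ample data, cross-validation prefers a small penalty; heavier regularization only pays off when predictors are noisy or collinear.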

R-squared for this model

Ridge Regression Model:
MSE: 13.07958 
R-squared: 0.6100785 
RMSE: 3.616571 

These values suggest that the ridge regression model is moderately effective in predicting basketball team wins from offensive efficiency and strength of schedule. However, an RMSE of roughly 3.6 wins leaves clear room for improvement.
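For reference, the three reported metrics reduce to a few lines; a sketch with made-up true and predicted win totals:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R^2, as reported for the ridge model."""
    resid = y_true - y_pred
    mse = float(np.mean(resid ** 2))
    rmse = mse ** 0.5
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1 - float(np.sum(resid ** 2)) / ss_tot
    return mse, rmse, r2

# Illustrative win totals, not values from the dashboard's test set
y_true = np.array([27.0, 29.0, 25.0, 24.0, 26.0])
y_pred = np.array([26.0, 28.5, 26.0, 23.0, 27.0])
mse, rmse, r2 = regression_metrics(y_true, y_pred)
```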

LOESS Fit

Column

Research Question

What is the nature of the relationship between offensive efficiency (OE) and three-pointer percentage (3P%) in basketball, and how does applying a loess fit, compared to traditional linear regression, enhance our understanding of this relationship? As the three-pointer percentage increases, offensive efficiency is expected to increase as well, since made three-pointers contribute more points per possession than two-point field goals. However, the relationship may not be strictly linear, and a loess fit may capture non-linear patterns more accurately than a linear regression model.

Fitting a simple linear model to our parameters

Column

Fitting an optimal LOESS Regression line with degree = 1

Plot 1

Summary

Call:
loess(formula = merged_df$OE ~ merged_df$`3P_Percentage`, data = merged_df, 
    span = 0.4843714, degree = 1)

Number of Observations: 362 
Equivalent Number of Parameters: 4.95 
Residual Standard Error: 5.783 
Trace of smoother matrix: 5.82  (exact)

Control settings:
  span     :  0.4843714 
  degree   :  1 
  family   :  gaussian
  surface  :  interpolate     cell = 0.2
  normalize:  TRUE
 parametric:  FALSE
drop.square:  FALSE 

Fitting an optimal LOESS Regression line with degree = 2

Plot 2

Summary

Call:
loess(formula = merged_df$OE ~ merged_df$`3P_Percentage`, data = merged_df, 
    span = 0.6612838, degree = 2)

Number of Observations: 362 
Equivalent Number of Parameters: 5.98 
Residual Standard Error: 5.78 
Trace of smoother matrix: 6.56  (exact)

Control settings:
  span     :  0.6612838 
  degree   :  2 
  family   :  gaussian
  surface  :  interpolate     cell = 0.2
  normalize:  TRUE
 parametric:  FALSE
drop.square:  FALSE 
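Under the hood, loess fits a weighted local polynomial around each evaluation point, with tricube weights over the span fraction of nearest neighbors. A minimal numpy sketch of a degree-1 fit at a single point, on synthetic data shaped roughly like the 3P%–OE relationship:

```python
import numpy as np

def loess_degree1(x, y, x0, span=0.5):
    """Local linear (degree-1) fit at x0 with tricube weights over the
    nearest span*n points -- the core of what loess() computes."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                    # the span-fraction neighborhood
    h = d[idx].max()
    w = (1 - (d[idx] / h) ** 3) ** 3           # tricube kernel
    A = np.column_stack([np.ones(k), x[idx] - x0])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y[idx])
    return beta[0]                             # fitted value at x0

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.25, 0.42, 362))      # illustrative 3P% range
y = 90 + 60 * x + rng.normal(scale=2.0, size=362)   # roughly linear OE
fit_mid = loess_degree1(x, y, 0.33)
```

A degree-2 fit simply adds a quadratic column to A, which is why its equivalent number of parameters in the summaries above is slightly higher.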

KNN Classification

Column

Research Question and PCA plot

Question: “How accurately can a K-nearest neighbors (KNN) classifier predict the success level of basketball teams based on their defensive efficiency (DE), strength of schedule (SOS), and tempo?”

The levels are divided based on Win-Loss Percentage as follows:

  • “Successful”: > 0.75
  • “Above Average”: 0.5 - 0.75
  • “Average”: 0.25 - 0.5
  • “Below Average”: < 0.25

We are performing \(\textbf{k-Nearest Neighbors (kNN)}\) classification on a dataset with predictors \(\textbf{"SOS" (Strength of Schedule)}\), \(\textbf{"DE" (Defensive Efficiency)}\), and \(\textbf{"Tempo"}\), categorizing teams based on their win-loss percentages into four categories: \(\textbf{“Successful", “Above Average", “Average",}\) and \(\textbf{“Below Average"}\). We split the data into training and test sets, train a kNN classifier with \(\textbf{k=7}\) neighbors, and evaluate its accuracy, providing insight into team categorization based on performance metrics.

SOS DE Tempo team_cat
12.42 91.1 64.6 Successful
11.57 87.7 63.5 Successful
14.65 94.6 67.0 Successful
9.49 92.4 70.0 Successful
13.35 90.2 69.3 Above Average
11.12 93.7 72.2 Above Average
Accuracy of KNN classifier: 0.6849315 
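The classifier itself is simple: for each test point, take the k nearest training points in Euclidean distance and let them vote. A numpy sketch with k=7, using toy clusters standing in for the standardized (SOS, DE, Tempo) features:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=7):
    """Majority vote over the k nearest training points (Euclidean distance).
    Predictors should be standardized first, since scale dominates distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(d)[:k]
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)

# Two toy clusters standing in for two team categories
rng = np.random.default_rng(5)
X0 = rng.normal(loc=[-1, 1, 0], size=(40, 3))
X1 = rng.normal(loc=[1, -1, 0], size=(40, 3))
X = np.vstack([X0, X1])
y = np.array(["Average"] * 40 + ["Successful"] * 40)
pred = knn_predict(X, y, np.array([[1.2, -0.8, 0.1]]), k=7)
```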

PCA

The above graph is obtained after performing Principal Component Analysis (PCA) on the \(\textbf{SOS}\), \(\textbf{DE}\), and \(\textbf{Tempo}\) variables from the \(\textbf{train_data}\). It converts the PCA results into a data frame and creates an interactive scatter plot using plotly, where each data point represents a team. The plot displays the teams in a two-dimensional space based on the first two principal components (\(\textbf{PC1}\) and \(\textbf{PC2}\)), with color indicating the \(\textbf{team_cat}\) variable (team category) and team names shown as hover text.
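The PCA projection behind that plot amounts to a singular value decomposition of the standardized predictor matrix; a numpy sketch:

```python
import numpy as np

def pca_2d(X):
    """Project standardized data onto its first two principal components --
    essentially what prcomp does before the plotly scatter is drawn."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)   # rows of Vt = PCs
    return Z @ Vt[:2].T                                # PC1 and PC2 scores

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))   # stand-in for the SOS, DE, Tempo columns
scores = pca_2d(X)              # one (PC1, PC2) pair per team
```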

Column

K value vs Accuracies

Plotting Accuracy for various values of K

The above plot is a comprehensive evaluation of KNN models with the number of neighbors (K) ranging from 1 to 50. We calculate the accuracy of each KNN model by comparing its predictions on the test dataset against the actual labels. We then identify the K value that achieves the highest accuracy and highlight this optimal point in the plot in red.
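The sweep over K is a straightforward loop; a self-contained sketch on synthetic two-class data:

```python
import numpy as np
from collections import Counter

def knn_acc(X_tr, y_tr, X_te, y_te, k):
    """Accuracy of a k-nearest-neighbors majority vote on the test set."""
    correct = 0
    for x, t in zip(X_te, y_te):
        nearest = np.argsort(np.linalg.norm(X_tr - x, axis=1))[:k]
        if Counter(y_tr[nearest]).most_common(1)[0][0] == t:
            correct += 1
    return correct / len(y_te)

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy two-class labels
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

accs = {k: knn_acc(X_tr, y_tr, X_te, y_te, k) for k in range(1, 51)}
best_k = max(accs, key=accs.get)               # the highlighted point
```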

The Results

Confusion Matrix


 
   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

 
Total Observations in Table:  73 

 
                  | results$Actual 
results$Predicted | Above Average |       Average | Below Average |    Successful |     Row Total | 
------------------|---------------|---------------|---------------|---------------|---------------|
    Above Average |            32 |             6 |             1 |             2 |            41 | 
                  |         0.821 |         0.222 |         0.200 |         1.000 |               | 
------------------|---------------|---------------|---------------|---------------|---------------|
          Average |             7 |            21 |             4 |             0 |            32 | 
                  |         0.179 |         0.778 |         0.800 |         0.000 |               | 
------------------|---------------|---------------|---------------|---------------|---------------|
     Column Total |            39 |            27 |             5 |             2 |            73 | 
                  |         0.534 |         0.370 |         0.068 |         0.027 |               | 
------------------|---------------|---------------|---------------|---------------|---------------|

 
$t
               y
x               Above Average Average Below Average Successful
  Above Average            32       6             1          2
  Average                   7      21             4          0

$prop.row
               y
x               Above Average    Average Below Average Successful
  Above Average    0.78048780 0.14634146    0.02439024 0.04878049
  Average          0.21875000 0.65625000    0.12500000 0.00000000

$prop.col
               y
x               Above Average   Average Below Average Successful
  Above Average     0.8205128 0.2222222     0.2000000  1.0000000
  Average           0.1794872 0.7777778     0.8000000  0.0000000

$prop.tbl
               y
x               Above Average    Average Below Average Successful
  Above Average    0.43835616 0.08219178    0.01369863 0.02739726
  Average          0.09589041 0.28767123    0.05479452 0.00000000

Confusion Matrix and Statistics

               Reference
Prediction      Successful Above Average Average Below Average
  Successful             0             0       0             0
  Above Average          2            32       6             1
  Average                0             7      21             4
  Below Average          0             0       0             0

Overall Statistics
                                          
               Accuracy : 0.726           
                 95% CI : (0.6091, 0.8239)
    No Information Rate : 0.5342          
    P-Value [Acc > NIR] : 0.0006216       
                                          
                  Kappa : 0.4906          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Successful Class: Above Average Class: Average
Sensitivity                     0.0000               0.8205         0.7778
Specificity                     1.0000               0.7353         0.7609
Pos Pred Value                     NaN               0.7805         0.6562
Neg Pred Value                  0.9726               0.7812         0.8537
Prevalence                      0.0274               0.5342         0.3699
Detection Rate                  0.0000               0.4384         0.2877
Detection Prevalence            0.0000               0.5616         0.4384
Balanced Accuracy               0.5000               0.7779         0.7693
                     Class: Below Average
Sensitivity                       0.00000
Specificity                       1.00000
Pos Pred Value                        NaN
Neg Pred Value                    0.93151
Prevalence                        0.06849
Detection Rate                    0.00000
Detection Prevalence              0.00000
Balanced Accuracy                 0.50000

The cross table compares the actual classes with the predicted classes from the classification model. Reading down each actual-class column:

  • “Above Average”: correctly predicted 32 of 39 instances (82.05%).
  • “Average”: correctly predicted 21 of 27 instances (77.78%).
  • “Below Average”: 0 of 5 instances correct; the classifier never predicted this class.
  • “Successful”: 0 of 2 instances correct; this class was never predicted either.

The total number of instances considered in the table is 73.

Naive Bayes Classification

Column

Research Question

How can Naive Bayes classification be utilized to categorize college basketball teams as good, average, or bad 3-point shooting teams based on their three-pointer percentage, considering that we observed a positive relationship between three-pointer percentage and Offensive Efficiency when using LOESS?
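With a single continuous feature, Gaussian Naive Bayes reduces to picking the class whose prior times Gaussian likelihood is largest. A sketch with hypothetical per-class means and standard deviations for 3P_Percentage (the real values would be estimated from the training split):

```python
import numpy as np

def gaussian_nb_predict(x, class_stats, priors):
    """Pick the class maximizing log prior + log Gaussian likelihood
    of the single feature (the constant term cancels across classes)."""
    best, best_score = None, -np.inf
    for c, (mu, sd) in class_stats.items():
        ll = np.log(priors[c]) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2
        if ll > best_score:
            best, best_score = c, ll
    return best

# Hypothetical per-class summaries of 3P_Percentage -- illustrative only
stats = {"bad": (0.30, 0.015), "average": (0.34, 0.015), "good": (0.38, 0.015)}
priors = {"bad": 1 / 3, "average": 1 / 3, "good": 1 / 3}
label = gaussian_nb_predict(0.37, stats, priors)
```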

Distribution of 3-Point Percentage Categories

Column {data-width=650}

Naive Bayes Classification

Accuracy: 0.6438356 

Confusion Matrix


 
   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

 
Total Observations in Table:  73 

 
             | actual 
   predicted |      poor |   average |      good | excellent | Row Total | 
-------------|-----------|-----------|-----------|-----------|-----------|
        poor |         3 |         6 |         0 |         0 |         9 | 
             |     0.600 |     0.146 |     0.000 |     0.000 |           | 
-------------|-----------|-----------|-----------|-----------|-----------|
     average |         2 |        27 |         9 |         0 |        38 | 
             |     0.400 |     0.659 |     0.346 |     0.000 |           | 
-------------|-----------|-----------|-----------|-----------|-----------|
        good |         0 |         8 |        17 |         1 |        26 | 
             |     0.000 |     0.195 |     0.654 |     1.000 |           | 
-------------|-----------|-----------|-----------|-----------|-----------|
Column Total |         5 |        41 |        26 |         1 |        73 | 
             |     0.068 |     0.562 |     0.356 |     0.014 |           | 
-------------|-----------|-----------|-----------|-----------|-----------|

 

PCA Plot

Logistic Regression

Column

Research Question

What is the relationship between a team’s number of wins (Wins) and their likelihood of having an above-average 3-point percentage (3P_Percentage > mean) versus a below-average 3-point percentage (3P_Percentage <= mean)?

Variables Used

Let’s briefly look at the data

  • Wins: The number of wins achieved by a team, likely indicating their overall performance or success in games.
  • 3P_Percentage: The percentage of successful three-point shots made by a team, a measure of their accuracy and skill in long-range shooting.
  • Binary_3P: A binary variable indicating whether a team’s 3P_Percentage is above the mean (1 for “High 3P”) or below/equal to the mean (0 for “Low/Medium 3P”), used as the target variable for logistic regression prediction.

First 6 rows of the dataset

Team Rk Wins 3P_Percentage Binary_3P
Connecticut 1 37 0.358 1
Houston 2 32 0.348 1
Purdue 3 34 0.406 1
Auburn 4 27 0.352 1
Tennessee 5 27 0.344 1
Arizona 6 27 0.366 1

Column

Summary

Summary of our model is:


Call:
glm(formula = Binary_3P ~ Wins, family = binomial, data = train_data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.37232    0.49966  -6.749 1.49e-11 ***
Wins         0.20052    0.02816   7.120 1.08e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 400.36  on 288  degrees of freedom
Residual deviance: 328.48  on 287  degrees of freedom
AIC: 332.48

Number of Fisher Scoring iterations: 4

The logistic regression equation for the model is:

\[ \eta = -3.37 + 0.20 \times \text{Wins} \]

Where:
- \(\eta\) (eta) is the linear predictor.
- Wins is the predictor variable.
- The intercept is -3.37232.
- The coefficient for Wins is 0.20052.
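Converting the linear predictor to a probability goes through the logistic function, \(p = 1/(1 + e^{-\eta})\). A sketch using the fitted coefficients from the glm summary:

```python
import math

def prob_high_3p(wins):
    """P(Binary_3P = 1 | Wins), using the fitted intercept and slope
    from the glm summary above."""
    eta = -3.37232 + 0.20052 * wins
    return 1 / (1 + math.exp(-eta))

p30 = prob_high_3p(30)   # a 30-win team: high probability of above-average 3P%
p10 = prob_high_3p(10)   # a 10-win team: low probability
```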


Drop-in-Deviance Likelihood Ratio Test

Assessing the overall goodness-of-fit.

Analysis of Deviance Table

Model 1: Binary_3P ~ 1
Model 2: Binary_3P ~ Wins
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       288     400.36                          
2       287     328.48  1   71.876 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The Analysis of Deviance table suggests that including the predictor “Wins” significantly improves the logistic regression model’s fit for predicting the binary outcome “Binary_3P.” The model with “Wins” as a predictor explains more of the variability in the response variable than the null model.
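The drop-in-deviance statistic is simply the difference of the two deviances, referred to a chi-square distribution with 1 degree of freedom; a sketch of the arithmetic:

```python
import math

def chisq1_pvalue(x):
    """Upper-tail probability of a chi-square with 1 df,
    via P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

dev_null, dev_resid = 400.36, 328.48   # deviances from the table above
lr_stat = dev_null - dev_resid         # 71.88, on 1 df
p = chisq1_pvalue(lr_stat)             # effectively zero, matching Pr(>Chi)
```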

Accuracy

Confusion matrix

Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 20 14
         1 13 26
                                          
               Accuracy : 0.6301          
                 95% CI : (0.5091, 0.7403)
    No Information Rate : 0.5479          
    P-Value [Acc > NIR] : 0.09731         
                                          
                  Kappa : 0.2554          
                                          
 Mcnemar's Test P-Value : 1.00000         
                                          
            Sensitivity : 0.6061          
            Specificity : 0.6500          
         Pos Pred Value : 0.5882          
         Neg Pred Value : 0.6667          
             Prevalence : 0.4521          
         Detection Rate : 0.2740          
   Detection Prevalence : 0.4658          
      Balanced Accuracy : 0.6280          
                                          
       'Positive' Class : 0               
                                          

The confusion matrix and metrics like accuracy (63.01%) and Cohen’s kappa (0.2554) reflect the binary classification model’s performance. Sensitivity (60.61%) and specificity (65.00%) show its ability to detect positive and negative instances accurately. The positive predictive value (58.82%) and prevalence (45.21%) offer insight into prediction accuracy and dataset composition.
